Pan-cancer characterization of cell-free immune-related miRNA identified as a robust biomarker for cancer diagnosis

Minimally invasive testing is essential for early cancer detection, which significantly affects patient survival. Our study aimed to establish a cell-free immune-related miRNA (cf-IRmiRNA) signature for early cancer detection. We analyzed circulating miRNA profiles from 15,832 participants, including individuals with 13 types of cancer and non-malignant controls. The data were randomly divided into training, validation, and test sets (7:2:1), with an additional external test set of 684 participants. In the discovery phase, we identified 100 cf-IRmiRNAs that were differentially expressed between malignant and non-malignant samples, and retained 39 of them using the least absolute shrinkage and selection operator (LASSO) method. Five machine learning algorithms were used to construct the cf-IRmiRNA signature, and the diagnostic classifier based on the XGBoost algorithm, tuned through 5-fold cross-validation and grid search, showed excellent performance for cancer detection in the validation set (AUC: 0.984, CI: 0.980–0.989). Further evaluation in the test and external test sets confirmed the reliability and efficacy of the classifier (AUC: 0.980 to 1.000). The classifier successfully detected early-stage cancers, particularly lung, prostate, and gastric cancers. It also distinguished between benign and malignant tumors. To our knowledge, this study represents the largest and most comprehensive pan-cancer analysis of cf-IRmiRNAs, offering a promising non-invasive diagnostic biomarker for early cancer detection with potential impact on clinical practice. Supplementary Information: The online version contains supplementary material available at 10.1186/s12943-023-01915-7.


Main text
Cancer is recognized as a severe public health problem, with increasing morbidity and mortality worldwide [1]. Despite therapeutic advancements, the prognosis of cancers remains grim. Early detection is crucial for improved outcomes, but current biomarkers and techniques are inadequate for widespread screening [2,3]. Hence, finding practical, minimally invasive approaches for early cancer detection is of great significance. Cell-free miRNAs (cf-miRNAs) offer promise as liquid biopsy markers due to their stability and abundance [4]. Inflammatory reactions and biomarkers may precede cancer diagnosis by years, and the immunosuppressive microenvironment resulting from chronic inflammation can contribute to the development and activation of cancer. Over the past decade, various miRNA-based signatures have been developed to diagnose certain cancer types [5][6][7][8][9]; however, limited sample sizes and incomplete model construction methods hinder their clinical utility. In addition, few studies have focused on the diagnostic performance of immune-related miRNAs. Therefore, we investigated cell-free immune-related miRNA (cf-IRmiRNA) profiles in malignant and non-malignant samples and explored their diagnostic utility.
This pan-cancer study analyzed the noncoding RNA profiles of 15,832 samples from non-malignant individuals and from patients with 13 cancer types: lung cancer, esophageal cancer, gastric cancer, liver cancer, colorectal cancer, breast cancer, prostate cancer, pancreatic cancer, ovarian cancer, bladder cancer, biliary tract cancer, sarcoma, and glioma. The workflow and the specific clinical information of these samples are provided in Fig. 1a-b, Fig. S1 and Table S1. A catalog of 1,256 immune miRNAs was curated (Table S2), and probes with a flag value above 3 in more than half of the samples were defined as abundant serum miRNAs (515 miRNAs). The overall distribution of the candidate cf-IRmiRNAs in malignant and non-malignant samples was evaluated through principal component analysis (PCA), which revealed markedly different distribution patterns (Fig. 1c).
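The abundance criterion described above (a flag value above 3 in more than half of the samples) can be illustrated with a minimal sketch. The data layout, the probe names, and the function name `abundant_probes` are assumptions made for illustration only; they are not taken from the study's pipeline.

```python
def abundant_probes(flags, threshold=3, min_fraction=0.5):
    """Return probes whose flag value exceeds `threshold` in more than
    `min_fraction` of the samples.

    `flags` maps a probe name to its list of per-sample flag values
    (a hypothetical layout chosen for this sketch).
    """
    kept = []
    for probe, values in flags.items():
        n_above = sum(1 for v in values if v > threshold)
        # Strict inequality: "more than half" excludes exactly half.
        if values and n_above / len(values) > min_fraction:
            kept.append(probe)
    return kept


# Toy data with four samples per probe (values are made up).
toy_flags = {
    "miR-A": [4, 5, 4, 2],  # 3 of 4 samples above 3, so it is kept
    "miR-B": [4, 4, 1, 1],  # exactly half above 3, so it is dropped
    "miR-C": [0, 1, 5, 2],  # 1 of 4 samples above 3, so it is dropped
}
print(abundant_probes(toy_flags))
```

In this toy example only `miR-A` passes the filter, because the criterion uses a strict "more than half" comparison.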
To identify reliable candidate cf-IRmiRNAs that were differentially represented between malignant samples and non-malignant controls, differential analysis was performed in the training set, which identified 100 differentially expressed cf-IRmiRNAs (|logFC| > 1, P value < 0.01) (Table S3). Pathway annotation of these differentially expressed miRNAs showed that they were largely associated with immune-related pathways, including the PI3K signaling pathway, the PD-1 signaling pathway in cancer, and the Wnt signaling pathway (Fig. 1d and Table S4). Using LASSO regression (λ = 0.008), we retained 39 miRNAs, and hierarchical clustering analysis based on these 39 miRNAs showed distinct expression patterns between malignant and non-malignant samples (Fig. 1e).
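The differential-expression thresholds (|logFC| > 1 and P value < 0.01) amount to a simple two-condition filter. The sketch below assumes the per-miRNA statistics are already available as (name, logFC, P value) tuples; the function name, the tuple layout, and the toy values are hypothetical, and the LASSO step is not reproduced here.

```python
def select_differential(results, min_abs_logfc=1.0, max_p=0.01):
    """Keep miRNAs with |logFC| > min_abs_logfc and P value < max_p.

    `results` is an iterable of (name, log_fc, p_value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    selected = []
    for name, log_fc, p_value in results:
        if abs(log_fc) > min_abs_logfc and p_value < max_p:
            selected.append(name)
    return selected


# Toy statistics (made up for illustration).
toy_results = [
    ("miR-A", 1.8, 0.0004),   # up-regulated and significant: kept
    ("miR-B", -1.3, 0.002),   # down-regulated and significant: kept
    ("miR-C", 0.6, 0.0001),   # significant but small effect: dropped
    ("miR-D", 2.1, 0.03),     # large effect but P >= 0.01: dropped
]
print(select_differential(toy_results))
```

Both conditions must hold at once, so a miRNA with a large fold change but a weak P value (or the reverse) is excluded.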
To evaluate the ability of the cf-IRmiRNA signature to distinguish cancer types, we analyzed the miRNA profiles of each cancer type individually against non-malignant samples. t-distributed stochastic neighbor embedding (t-SNE) was used to visualize the differences between cancer types based on differentially expressed cf-IRmiRNAs in a lower-dimensional space (Fig. 2e). The diagnostic index, calculated with the cf-IRmiRNA signature, was higher in malignant samples than in non-malignant ones (Fig. 2f). Moreover, the diagnostic index showed high discriminant performance in each cancer type, especially in lung cancer (AUC: 0.998, 95% CI: 0.998–0.999, sensitivity: 0.995, specificity: 0.987, positive predictive value (PPV): 0.942, negative predictive value (NPV): 0.999), esophageal cancer (ESCA; AUC: 0.998, 95% CI: 0.997–0.999, sensitivity: 0.990, specificity: 0.981, PPV: 0.804, NPV: 0.999), and gastric cancer (STAD; AUC: 0.999, 95% CI: 0.998–0.999, sensitivity: 0.992, specificity: 0.990, PPV: 0.984, NPV: 0.998) (Fig. 2g and h). Although the PPV of the cf-IRmiRNA signature was somewhat lower in certain cancer types, the classifier showed a remarkably high NPV. This suggests that the classifier is well suited to cancer screening, where it can maximize the detection of positive cases and reduce delayed cancer diagnoses. Notably, the classifier still exhibited outstanding performance in early-stage cancer detection, especially in lung and gastric carcinoma, with an AUC of 0.990 (Fig. 2i).
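The contrast between PPV and NPV follows from how the two values depend on the proportion of cancer cases in the evaluated cohort. The sketch below applies the standard definitions to the sensitivity and specificity reported for ESCA (0.990 and 0.981); the two prevalence values and the function name are assumptions chosen only to illustrate the effect, not figures from the study.

```python
def ppv_npv_at_prevalence(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence.

    Works on population fractions rather than counts:
    PPV = TP / (TP + FP), NPV = TN / (TN + FN).
    """
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as reported for ESCA in the text;
# the prevalence values are hypothetical.
for prevalence in (0.50, 0.05):
    ppv, npv = ppv_npv_at_prevalence(0.990, 0.981, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

With sensitivity and specificity held fixed, PPV falls as the cancer type becomes rarer in the cohort, while NPV stays close to 1, which is consistent with the pattern described above.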
Additionally, we verified the potential utility of the cf-IRmiRNA signature for distinguishing between benign and malignant lesions within the corresponding organs or tissues. Within the same organ or tissue, the diagnostic index of malignancies was significantly higher than that of benign lesions (Fig. S5a, c, e, g, and i). ROC analysis confirmed its effectiveness in differential diagnosis across various tissues, including mesenchymal tissues, breast, liver, prostate, and ovary, with AUCs of 0.955, 0.904, 0.999, 0.994, and 0.928, respectively (Fig. S5b, d, f, h, and j).
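For readers less familiar with the AUC, it can be read as the probability that a randomly chosen malignant sample receives a higher diagnostic index than a randomly chosen benign one. The following minimal sketch computes it by pairwise comparison; the function name and the toy scores are hypothetical and unrelated to the study data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy diagnostic-index values (made up for illustration).
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.5, 0.3, 0.2]
print(round(roc_auc(malignant, benign), 3))
```

The pairwise form is quadratic in the number of samples, so it is meant for explanation rather than for cohorts of the size analyzed here.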